Predictive Text Entry for Agglutinative Languages Using Unsupervised Morphological Segmentation

نویسندگان

  • Miikka Silfverberg
  • Krister Lindén
  • Mirka Hyvärinen
چکیده

Systems for predictive text entry on ambiguous keyboards typically rely on dictionaries with word frequencies which are used to suggest the most likely words matching user input. This approach is insufficient for agglutinative languages, where morphological phenomena increase the rate of out-of-vocabulary words. We propose a method for text entry, which circumvents the problem of out-of-vocabulary words, by replacing the dictionary with a Markov chain on morph sequences combined with a third order hidden Markov model (HMM) mapping key sequences to letter sequences and phonological constraints for pruning suggestion lists. We evaluate our method by constructing text entry systems for Finnish and Turkish and comparing our systems with published text entry systems and the text entry systems of three commercially available mobile phones. Measured using the keystrokes per character ratio (KPC) [8], we achieve superior results. For training, we use corpora, which are segmented using unsupervised morphological segmentation.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Predictive Text Entry for Agglutinative Languages Using Morphological Segmentation and Phonological Restrictions

Systems for predictive text entry on ambiguous keyboards typically rely on dictionaries with word frequencies which are used to suggest the most likely words matching user input. This approach is insufficient for agglutinative languages, where morphological phenomena increase the rate of out-of-vocabulary words. We propose a method for text entry, which circumvents the problem of out-of-vocabul...

متن کامل

The Study of Effect of Length in Morphological Segmentation of Agglutinative Languages

Morph length is one of the indicative feature that helps learning the morphology of languages, in particular agglutinative languages. In this paper, we introduce a simple unsupervised model for morphological segmentation and study how the knowledge of morph length affect the performance of the segmentation task under the Bayesian framework. The model is based on (Goldwater et al., 2006) unigram...

متن کامل

Joint Bayesian Morphology learning for Dravidian languages

In this paper a methodology for learning the complex agglutinative morphology of some Indian languages using Adaptor Grammars and morphology rules is presented. Adaptor grammars are a compositional Bayesian framework for grammatical inference, where we define a morphological grammar for agglutinative languages and morphological boundaries are inferred from a plain text corpus. Once morphologica...

متن کامل

Building Morphological Chains for Agglutinative Languages

In this paper, we build morphological chains for agglutinative languages by using a log linear model for the morphological segmentation task. The model is based on the unsupervised morphological segmentation system called MorphoChains [1]. We extend MorphoChains log linear model by expanding the candidate space recursively to cover more split points for agglutinative languages such as Turkish, ...

متن کامل

Evaluating an Agglutinative Segmentation Model for ParaMor

This paper describes and evaluates a modification to the segmentation model used in the unsupervised morphology induction system, ParaMor. Our improved segmentation model permits multiple morpheme boundaries in a single word. To prepare ParaMor to effectively apply the new agglutinative segmentation model, two heuristics improve ParaMor’s precision. These precision-enhancing heuristics are adap...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012